How does my voice get recognized and converted to text?



This can take place in up to three levels/stages...

LEVEL 1

When you say a word, digital processing of your sample (or samples) occurs from the start of the sample to the end, according to the sample processing resolution you have set. That is, if your captured word is 12000 bytes long and you keep a sample processing resolution of 2000, then 12000 / 2000 = 6 Engine Categories will be produced in sequence from start to end.
These categories are made available with these settings...
VoiceAction1.Enable_Speech_Engine = "YES"
VoiceAction1.Explode_out_all_Engine = "NO"
VoiceAction1.Keep_Engine_Frequency_Input_Flow = "Pass All"
YourLabel.Caption = VoiceAction1.Voice_Magic_Elements
You can see them in the SDK App or the Word Viewer.
VoiceAction then checks which EGN file is in its path (or applies the default) and applies the character codes that are available for those Engine Categories; thus it CONSTRUCTS a string. As an example, for the six categories it may construct /vv/i/nn/oo/eu/ss/ for Windows.
This string is available in VoiceAction1.Voice_Word.
Note that you can skip the other levels and use this low level alone for recognition.
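The Level 1 flow above can be illustrated with a small conceptual sketch. This is not the VoiceAction internals: the window splitting is real arithmetic from the example (12000 bytes at a resolution of 2000 gives 6 categories), but the category-to-code table below is a hypothetical stand-in for what a real EGN file would define.

```python
# Conceptual sketch of Level 1 (NOT the VoiceAction engine itself):
# split a captured sample into fixed-size windows, one Engine Category
# per window, then map category numbers to character codes and join them.

SAMPLE_RESOLUTION = 2000  # bytes per processing window, as in the example

# Hypothetical category -> character-code table (a real EGN file defines these)
EGN_CODES = {0: "vv", 1: "i", 2: "nn", 3: "oo", 4: "eu", 5: "ss"}

def split_into_windows(sample: bytes, resolution: int):
    """Split the raw sample into fixed-size windows, one per Engine Category."""
    return [sample[i:i + resolution] for i in range(0, len(sample), resolution)]

def build_voice_word(categories):
    """Join the character codes into a /code/code/.../ string like Voice_Word."""
    return "/" + "/".join(EGN_CODES[c] for c in categories) + "/"

sample = bytes(12000)                          # a 12000-byte captured word
windows = split_into_windows(sample, SAMPLE_RESOLUTION)
print(len(windows))                            # 12000 / 2000 = 6 categories
print(build_voice_word([0, 1, 2, 3, 4, 5]))    # /vv/i/nn/oo/eu/ss/
```

The real engine classifies each window by its audio content; here the category sequence is simply supplied by hand to show how the string is assembled.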


What if I want to construct a Speech Word Processor?
This is the level for that. You will also have to prepare Speech Models based on this Engine response. These programs devise a set of English or foreign-language words that are similar to the regional speech type used. You can do the same for your own words in your own language. The words out of the set that might have been spoken are identified, and an Artificial Intelligence Model is applied to them. This model confirms one word out of many by considering the previous words that were spoken.

What if my context-sensitive vocabulary of words is large?

Of course, if you have an extremely large number of words that you want to recognize at the same time, or if there is a condition in your program such that the user may say any word out of one thousand, then developing these models becomes a necessity. But before doing this, recheck whether you really need it. We repeat: recheck a trillion times whether you really need this. Surprised? Why don't you break the thousand words into sections of 100 and ask the user, "Which section does your word belong to?" See... this is how you keep reducing the work of huge, tedious models and come down to 5 to 10 words, which can be recognized by the simple pattern matching given in the next levels.



LEVEL 2

 

Simple pattern matching can also do our job of speech interactivity, and with good dialogue techniques you can disable substitution and avoid the risk of mistakes (which are more frequent with those Speech Models).
Now this produced Voice_Word can be matched against a number of patterns that you propose at this level itself. You can do this for two purposes...
1. To pass only particular patterns of strings to the last level, the Language Editor.
VoiceAction1.Words_I_Want 1, "s*s"
VoiceAction1.Words_I_Want 2, "a*th"
VoiceAction1.Words_to_Scan 1, 2
2. Direct matching of strings, like that of the Language Editor, and checking which one got hooked.
You can do this using...
Dim YourChecker As Integer
YourChecker = VoiceAction1.Words_I_Check
YourChecker = VoiceAction1.Voice_Word_Matched
Try this using the SDK App.
YOU CAN SKIP this level by using the wildcard like this...
VoiceAction1.Words_I_Want 1, "*"
VoiceAction1.Words_to_Scan 1, 1
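A minimal sketch of what Level 2 pattern matching does, using Python's shell-style wildcards to stand in for the Words_I_Want patterns. VoiceAction performs its own matching internally; `fnmatch` is used here only because its `*` wildcard behaves the same way as in the examples above.

```python
# Conceptual sketch of Level 2: scan a numbered table of wildcard patterns
# (like Words_I_Want slots) and report which slot the Voice_Word hooked.
from fnmatch import fnmatch

patterns = {1: "s*s", 2: "a*th"}   # slot -> pattern, as set with Words_I_Want

def first_matching_slot(voice_word: str, first: int, last: int):
    """Scan slots first..last (like Words_to_Scan) and return the slot
    whose pattern matches the word, or 0 if nothing matched."""
    for slot in range(first, last + 1):
        if fnmatch(voice_word, patterns[slot]):
            return slot
    return 0

print(first_matching_slot("stars", 1, 2))    # slot 1: "s*s" matches
print(first_matching_slot("azimuth", 1, 2))  # slot 2: "a*th" matches
print(first_matching_slot("hello", 1, 2))    # 0: no pattern matched
```

Setting a slot's pattern to `"*"` makes that slot match everything, which is exactly how the skip-this-level trick above works.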

LEVEL 3

Whatever is passed by the Second Stage above comes to the Language Editor department. Here VoiceAction scans the Lang file kept in its path for the pattern-matching protocols you had entered in the file.
It not only matches the strings but also applies ranking as per the length and rating you had input.
As an example, if your Engine gives "aaaaaaaaad", then a protocol like "aaa*" will generate a hit.
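A sketch of the Level 3 idea under stated assumptions: the protocol list, replacement texts, and ratings below are hypothetical, and the exact ranking formula inside VoiceAction is not documented here, so this example simply ranks hits by pattern length and then rating, as the text suggests.

```python
# Sketch of Level 3: match the Engine string against Lang-file-style
# protocols and rank the hits by pattern length and a user-given rating.
from fnmatch import fnmatch

# (pattern, output_text, rating) triples, as one might keep in a Lang file
protocols = [
    ("aaa*", "aah", 5),
    ("a*",   "a",   1),
]

def best_hit(engine_string: str):
    """Return the matching protocol ranked highest by (pattern length, rating)."""
    hits = [p for p in protocols if fnmatch(engine_string, p[0])]
    return max(hits, key=lambda p: (len(p[0]), p[2]), default=None)

print(best_hit("aaaaaaaaad"))   # both patterns hit; longer "aaa*" outranks "a*"
```

With the engine string `"aaaaaaaaad"` from the example, both `"aaa*"` and `"a*"` match, and the ranking picks the longer, higher-rated protocol.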

In this way, using the Engine Category as the base, a text conversion can be generated using any or all of the three levels.